AITopics | auxiliary reward

034d7bfeace2a9a258648b16fc626298-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 06:24:03 GMT

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

REFUEL: Exploring Sparse Features in Deep Reinforcement Learning for Fast Disease Diagnosis

Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, Edward Chang

Neural Information Processing Systems, Mar-22-2026, 23:47:23 GMT

This paper proposes REFUEL, a reinforcement learning method that uses two techniques, reward shaping and feature rebuilding, to improve the performance of online symptom checking for disease diagnosis.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Louisiana (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(5 more...)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

REFUEL: Exploring Sparse Features in Deep Reinforcement Learning for Fast Disease Diagnosis

Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, Edward Chang

Neural Information Processing Systems, Feb-14-2026, 06:43:27 GMT

Neural Information Processing Systems http://nips.cc/

agent, positive symptom, symptom, (14 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(10 more...)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

dc1913d422398c25c5f0b81cab94cc87-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 18:00:39 GMT

agent, auxiliary reward, side effect, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Manuals

Neural Information Processing Systems, Feb-7-2026, 06:59:35 GMT

High sample complexity has long been a challenge for RL.

machine learning, natural language, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Behavior Alignment via Reward Function Optimization

Neural Information Processing Systems, Dec-26-2025, 12:12:27 GMT

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards.

behavior alignment, name change, reward function optimization, (5 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang

Neural Information Processing Systems, Oct-3-2025, 02:47:42 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

f8b7aa3a0d349d9562b424160ad18612-Paper.pdf

Neural Information Processing Systems, Aug-19-2025, 00:13:05 GMT

machine learning, natural language, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

Avoiding Side Effects By Considering Future Tasks

Victoria Krakovna

Neural Information Processing Systems, Aug-16-2025, 20:01:55 GMT

Designing reward functions for a reinforcement learning agent is often a difficult task.

agent, auxiliary reward, side effect, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization

Wang, Yikun, Wang, Yibin, Wang, Dianyi, Peng, Zimian, Guo, Qipeng, Tao, Dacheng, Wang, Jiaqi

arXiv.org Artificial Intelligence, Jul-1-2025

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, particularly in mathematical reasoning. Within this area, geometry problem solving remains challenging, and auxiliary construction plays an essential role in it. Existing approaches either achieve suboptimal performance or rely on massive LLMs (e.g., GPT-4o), incurring massive computational costs. We posit that reinforcement learning with verifiable reward (e.g., GRPO) offers a promising direction for training smaller models that effectively combine auxiliary construction with robust geometric reasoning. However, directly applying GRPO to geometric reasoning presents fundamental limitations due to its dependence on unconditional rewards, which leads to indiscriminate and counterproductive auxiliary constructions. To address these challenges, we propose Group Contrastive Policy Optimization (GCPO), a novel reinforcement learning framework featuring two key innovations: (1) Group Contrastive Masking, which adaptively provides positive or negative reward signals for auxiliary construction based on contextual utility, and (2) a length reward that promotes longer reasoning chains. Building on GCPO, we develop GeometryZero, a family of affordable-size geometric reasoning models that judiciously determine when to employ auxiliary construction. Our extensive empirical evaluation across popular geometric benchmarks (Geometry3K, MathVista) demonstrates that GeometryZero models consistently outperform baselines (e.g., GRPO), achieving an average improvement of 4.29% across all benchmarks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.0716

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
